Combination of a hidden tag model and a traditional n-gram model: a case study in Czech speech recognition
Abstract
A speech recognition system targeting highly inflective languages is described that combines a traditional trigram language model with an HMM tagger, obtaining results superior to the trigram language model alone. An experiment on speech recognition of Czech has been performed with promising results.

1. Speech Recognition of Inflective Languages

Inflective languages pose a hard problem in speech recognition due to two phenomena: their highly inflective nature (causing a data sparseness problem and excessive vocabulary growth) and free word order (causing traditional speech recognition systems, such as n-gram Hidden Markov Models (HMMs) on word forms, to be less accurate than for English). Specific methods targeting speech recognition of inflective languages have already been introduced in [1], [2] and [3]. The authors mainly focus on improving the language model by decomposing words from the vocabulary into stems and endings. This approach has mainly helped in reducing the size of the recognizer's vocabulary, reducing the WER slightly.

2. Combining Taggers with Language Models

To the best of our knowledge, a tagger was first introduced as a language model component of a speech recognizer in [4], without improving results over the baseline bigram model. The idea has been further explored in [5], where the author proposes interpolation with a trigram model:

P(W) = λ P(w_i | w_{i-2}, w_{i-1}) + (1 − λ) Q(w_i | g(w_{i-2}), g(w_{i-1})),   (1)

where g(w_i) is the tagging function. The importance of formula (1) for languages with the data sparseness problem is that the new component Q can have enough evidence to give reliable statistics about the word sequence W, as the size of the tag set tends to be much smaller than the size of the vocabulary itself. The problem with approach (1) is that the tagging function g(w_i) depends on all words of the utterance (supposing that the tagging is performed by an HMM tagger).
The standard solution is to replace the probability Q by a new probability Q*:

Q*(w_i | w_1, …, w_{i-1}) = Σ_{g_1, g_2} Q(w_i | g_1, g_2) T(g(w_{i-2}) = g_2, g(w_{i-1}) = g_1).   (2)

The new probability T is the corresponding forward probability of the HMM tagger.
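A minimal sketch of the marginalization in formula (2), assuming toy values for Q and for the tagger's forward probabilities T (none of these numbers come from the paper):

```python
# Sketch of Eq. (2): Q is replaced by Q*, a sum over tag pairs
# weighted by the HMM tagger's forward probabilities T.
# All numbers below are illustrative assumptions, not the paper's model.

TAGS = ("VERB", "NOUN")

# Toy Q(w_i | g2, g1): word probability given the tags of the two
# preceding words (g2 = tag of w_{i-2}, g1 = tag of w_{i-1})
Q = {
    ("VERB", "VERB", "psa"): 0.10,
    ("VERB", "NOUN", "psa"): 0.05,
    ("NOUN", "VERB", "psa"): 0.02,
    ("NOUN", "NOUN", "psa"): 0.01,
}

# Toy forward probabilities T(g(w_{i-2}) = g2, g(w_{i-1}) = g1):
# the tagger's joint distribution over the two preceding tags
T = {
    ("VERB", "VERB"): 0.60,
    ("VERB", "NOUN"): 0.20,
    ("NOUN", "VERB"): 0.15,
    ("NOUN", "NOUN"): 0.05,
}

def q_star(w):
    """Q*(w | history): marginalize Q over tag pairs weighted by T."""
    return sum(
        Q.get((g2, g1, w), 0.0) * T.get((g2, g1), 0.0)
        for g2 in TAGS
        for g1 in TAGS
    )

print(round(q_star("psa"), 4))  # 0.0735
```

Because T is a distribution over tag pairs, Q* averages the tag-conditioned word probabilities instead of committing to a single tagging of the history.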
Related papers
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Presentation of K Nearest Neighbor Gaussian Interpolation and comparing it with Fuzzy Interpolation in Speech Recognition
Hidden Markov Model is a popular statistical method that is used in continuous and discrete speech recognition. The probability density function of observation vectors in each state is estimated with discrete-density or continuous-density modeling. The performance (in correct word recognition rate) of continuous-density HMM is higher than discrete-density HMM, but its computational complexity is very ...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Learning Representations for Weakly Supervised Natural Language Processing Tasks
Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and ...
Publication date: 2003